From News to Comment: Resources and Benchmarks for Parsing the Language of Web 2.0

Authors

  • Jennifer Foster
  • Özlem Çetinoglu
  • Joachim Wagner
  • Joseph Le Roux
  • Joakim Nivre
  • Deirdre Hogan
  • Josef van Genabith
Abstract

We investigate the problem of parsing the noisy language of social media. We evaluate four Wall-Street-Journal-trained statistical parsers (Berkeley, Brown, Malt and MST) on a new dataset containing 1,000 phrase structure trees for sentences from microblogs (tweets) and discussion forum posts. We compare the four parsers on their ability to produce Stanford dependencies for these Web 2.0 sentences. We find that the parsers have a particular problem with tweets and that a substantial part of this problem is related to POS tagging accuracy. We attempt three retraining experiments involving Malt, Brown and an in-house Berkeley-style parser and obtain a statistically significant improvement for all three parsers.
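As an illustration of how dependency output from different parsers might be compared, the sketch below computes unlabeled and labeled attachment scores (UAS and LAS) for one sentence. This is a minimal sketch under stated assumptions: the abstract does not say which metric or tooling the authors used, and the function name `attachment_scores`, the `Parse` type, and the toy sentence are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical representation: token index -> (head index, dependency label).
# Head index 0 stands for the artificial root.
Parse = Dict[int, Tuple[int, str]]


def attachment_scores(gold: Parse, predicted: Parse) -> Tuple[float, float]:
    """Return (UAS, LAS) as fractions of the gold tokens.

    UAS counts tokens whose predicted head matches the gold head.
    LAS additionally requires the dependency label to match.
    """
    if not gold:
        raise ValueError("gold parse is empty")
    head_ok = 0
    both_ok = 0
    for token, (gold_head, gold_label) in gold.items():
        guess = predicted.get(token)
        if guess is None:
            continue  # a missing token counts as an error
        if guess[0] == gold_head:
            head_ok += 1
            if guess[1] == gold_label:
                both_ok += 1
    total = len(gold)
    return head_ok / total, both_ok / total


# Toy example (invented, not from the dataset): "i luv this"
gold = {1: (2, "nsubj"), 2: (0, "root"), 3: (2, "dobj")}
pred = {1: (2, "nsubj"), 2: (0, "root"), 3: (2, "nsubj")}
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.3f} LAS={las:.3f}")
```

In this toy case every head is correct but one label is wrong, so the labeled score is lower than the unlabeled one.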

Similar resources

Impact of Dynamic Assessment on the Writing Performance of English as Foreign Language Learners in Asynchronous Web 2.0 and Face-to-face Environments

This study sought to investigate dynamic assessment (DA) - an assessment approach that embeds intervention within the assessment process and that yields information about the learner’s responsiveness to this intervention - and the writing performance of the second language (L2) learners in Web 2.0 contexts. To this end, pre and post-treatment writings of 45 par...

An improved joint model: POS tagging and dependency parsing

Dependency parsing is a form of syntactic parsing that automatically analyzes the dependency structure of natural-language sentences, producing a dependency graph for each input sentence. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform POS tagging and dependency parsing in a pipeline. Unfortunately, in pipel...

Implications of News Segments and Movies for Enhancing Listening Comprehension of Language Learners

Armed with technological development, the present study aimed at gauging the effectiveness of exposure to news and movies as two types of audiovisual programs in improving language learners’ listening comprehension at the intermediate level. To this end, a listening comprehension test was administered to 108 language learners and finally 60 language learners were selected as intermedia...

The Effect of Morphology on Dependency Parsing of Persian

Data-driven systems can be adapted easily to different languages and domains. This advantage has led to the introduction of data-driven approaches in dependency parsing. The only prerequisite for such approaches is the existence of appropriate corpora containing sentences and their associated dependency trees. Despite obtaining highly accurate results for the dependency parsing task in English langu...

Publication date: 2011